At the 2024 World Artificial Intelligence Conference (WAIC), SenseTime released "Ri Ri Xin 5o" (SenseNova 5o), which it billed as China's first "what you see is what you get" model. The model offers an interactive experience comparable to GPT-4o and supports real-time, streaming multimodal interaction. It integrates cross-modal information such as sound, text, images, and video, which lets it understand what it perceives and respond in real time. For example, it can recognize the name tags worn by staff, determine the venue location,